docs: Clarify CLI help text to align with databus hierarchy #47
base: main
Conversation
📝 **Walkthrough**: Updated user-facing help text for the deploy CLI options to clarify artifact-level vs. version-level fields; renamed parameters in the API function `create_dataset`.

Actionable comments posted: 1
🧹 Nitpick comments (4)
databusclient/cli.py (2)
14-56: Consider several improvements to the parsing logic.

- **Warning mechanism (line 49):** Using `print()` for warnings writes directly to stdout, which is not ideal for library code. Consider using Python's `logging` module or collecting warnings in the return value for better testability and control.
- **Compression heuristic is fragile (lines 36-39):** The classification of `.tar` as compression is questionable, since `.tar` is usually treated as an archive format, not a compression format. Files like `.tar.gz` involve both archiving and compression. This heuristic may misclassify extensions and lead to incorrect metadata.
- **Missing docstring examples:** The docstring should include concrete examples showing input and expected output to clarify the expected format and behavior.
- **No validation of key=value pairs (line 43):** The code splits on `'='` but doesn't validate that the key or value is non-empty. Consider adding validation to reject malformed variants like `"=value"` or `"key="`.
- **Inconsistent extension handling:** Line 37's comment suggests conditional removal of the leading dot ("if needed"), but the code always strips it. Clarify the expected format or document why the leading dot is always removed.
🔎 Suggested improvements

Replace `print()` with proper logging:

```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_distribution_str(dist_str: str):
     """
     Parses a distribution string with format: URL|key=value|...|.extension
     Returns a dictionary suitable for the deploy API.
+
+    Examples:
+        >>> parse_distribution_str("http://example.com/data.json|lang=en|.json|.gz")
+        {'url': 'http://example.com/data.json', 'variants': {'lang': 'en'},
+         'formatExtension': 'json', 'compression': 'gz'}
     """
     ...
     else:
-        print(f"WARNING: Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")
+        logger.warning(f"Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")
```

Add validation for key=value pairs:

```diff
 elif '=' in part:
     key, value = part.split('=', 1)
+    if not key.strip() or not value.strip():
+        logger.warning(f"Invalid key=value pair '{part}' in distribution.")
+        continue
     variants[key.strip()] = value.strip()
```
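Pulling the review's suggestions together, a minimal self-contained sketch of such a parser could look like this. The function name and dictionary keys follow the review discussion; the set of compression extensions is an assumption for illustration, not the repository's actual code:

```python
import logging

logger = logging.getLogger(__name__)

# Assumed compression extensions; the real project may classify differently.
COMPRESSION_EXTS = {"gz", "bz2", "xz", "zip"}

def parse_distribution_str(dist_str: str) -> dict:
    """Parse 'URL|key=value|...|.ext' into a dict for the deploy API."""
    parts = dist_str.split("|")
    result = {"url": parts[0], "variants": {},
              "formatExtension": None, "compression": "none"}
    for part in parts[1:]:
        if part.startswith("."):
            ext = part.lstrip(".")
            if ext in COMPRESSION_EXTS:
                result["compression"] = ext
            else:
                result["formatExtension"] = ext
        elif "=" in part:
            key, value = part.split("=", 1)
            # Reject malformed pairs like "=value" or "key="
            if not key.strip() or not value.strip():
                logger.warning("Invalid key=value pair %r in distribution.", part)
                continue
            result["variants"][key.strip()] = value.strip()
        elif part:
            logger.warning("Unrecognized modifier %r; expected '.ext' or 'key=val'.", part)
    return result
```

Using `logger.warning` instead of `print()` keeps diagnostics off stdout and lets callers silence or capture them through the standard logging configuration.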
129-137: Remove developer scaffolding comments and consider error handling.

- **Lines 129, 137:** The `# --- CHANGE START/END ---` comments are developer scaffolding and should be removed before merging.
- **Line 131:** Consider adding error handling around the `parse_distribution_str` calls. If parsing fails (e.g., a malformed distribution string), the user should receive a clear error message rather than an uncaught exception.
- **Behavioral change:** This modification changes the contract between the CLI and `api_deploy.create_dataset`. Previously, raw strings were passed; now parsed dictionaries are passed. While this appears to be intentional and supported by changes in `deploy.py`, it's worth verifying that all downstream consumers can handle this change.

🔎 Proposed cleanup

```diff
-# --- CHANGE START ---
-# Parse the input strings into structured objects
 parsed_distributions = [parse_distribution_str(d) for d in distributions]
-# Note: api_deploy.create_dataset now accepts this list of dicts
 dataid = api_deploy.create_dataset(
     version_id, title, abstract, description, license_url, parsed_distributions
 )
-# --- CHANGE END ---
```

databusclient/api/queries.py (1)
54-93: Consider adding validation for malformed key=value pairs.

Similar to `parse_distribution_str` in `cli.py`, this function splits on `'='` (line 87) but doesn't validate that both key and value are non-empty. Consider adding a check to handle edge cases like `"=value"` or `"key="`.

🔎 Suggested validation

```diff
 if "=" in part:
     key, value = part.split("=", 1)
+    if not key.strip() or not value.strip():
+        # Skip malformed key=value pairs
+        continue
     variants[key.strip()] = value.strip()
```

tests/test_parse_distribution.py (1)
1-275: Excellent test coverage with minor gaps. The test suite is comprehensive and well-structured:
- Clear test organization with descriptive test names
- Good coverage of parsing logic: URL extraction, variants, extensions, compression
- Appropriate use of mocking to avoid network dependencies
- Integration tests validate end-to-end behavior with the deploy API
Minor suggestions for additional test coverage:
- **Error handling:** Consider adding tests for truly malformed inputs that might cause exceptions (e.g., `None` input, non-string input, extremely long strings).
- **Edge case, URLs containing the delimiter:** Test behavior when the URL itself contains the `'|'` delimiter character (e.g., `"http://example.com/file?param=a|b|.json"`). The current implementation would split incorrectly.
- **Empty parts:** Test distribution strings with empty parts between delimiters (e.g., `"http://example.com/file||.json"` with a double pipe).

🔎 Example additional tests

```python
def test_url_with_pipe_character(self):
    """Test URL containing a pipe character in the query string."""
    # This is an edge case that might need special handling
    result = parse_distribution_str("http://example.com/file?param=a|b")
    # The current implementation would incorrectly split on the pipe in the URL
    assert result["url"] == "http://example.com/file?param=a|b"

def test_empty_parts_between_delimiters(self):
    """Test handling of empty parts between delimiters."""
    result = parse_distribution_str("http://example.com/file||.json")
    # Should gracefully handle empty parts
    assert result["formatExtension"] == "json"
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- databusclient/api/deploy.py
- databusclient/api/queries.py
- databusclient/cli.py
- tests/test_deploy.py
- tests/test_parse_distribution.py
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/cli.py (1)
databusclient/api/deploy.py (1)
`create_dataset` (304-465)
tests/test_parse_distribution.py (2)
databusclient/cli.py (1)
`parse_distribution_str` (14-56)

databusclient/api/deploy.py (1)

`_get_file_info_from_dict` (176-208)
🔇 Additional comments (4)
databusclient/cli.py (1)
73-75: LGTM! Help text improvements align with PR objectives. The updated help text successfully clarifies the distinction between Artifact-level metadata (permanent name) and Version-level metadata (release-specific). The terminology now explicitly maps to the DBpedia Databus data model.
databusclient/api/deploy.py (1)
304-387: LGTM! Backwards-compatible API extension. The extension of the `distributions` parameter to accept `Union[List[str], List[Dict]]` is well implemented:

- Backwards compatible with existing string-based distributions
- Clear documentation in the docstring (lines 334-337)
- Appropriate use of an `isinstance()` check to branch on format (line 368)
- Both branches produce the same tuple structure for consistent downstream processing

The approach correctly maintains the existing validation (lines 389-393) for both formats.
tests/test_deploy.py (1)
15-16: LGTM! Good test hygiene. Extracting the example URL into a named constant improves test maintainability and readability.
databusclient/api/queries.py (1)
1-51: This module introduces unused code unrelated to the PR's stated objective.

`ONTOLOGIES_QUERY` and `parse_content_variants_string()` are defined in this new module but are not imported or used anywhere in the codebase. Since the PR aims to clarify CLI help text, this feature addition appears out of scope. Either:

- Move this to a separate PR with proper documentation of the feature's purpose and integration points, or
- Integrate it into the codebase (add imports where needed and update the PR description to reflect this feature work)

Additionally, document why these functions are being added if they're intended for future use.
c6316e2 to aca668e
Greetings @Integer-Ctrl @m1ci, could you please review this PR?
Kindly review it.
Integer-Ctrl left a comment
As you said, it is a good idea to keep the existing CLI arguments for now to avoid breaking changes. However, perhaps you could change the variable names in the method create_dataset in the deploy.py file:
- title -> artifact_title
- abstract -> artifact_abstract
- description -> artifact_description
This would improve clarity.
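The rename Integer-Ctrl proposes would make the artifact scope explicit in the signature while leaving the CLI options unchanged. Roughly, as a sketch with simplified types and a minimal body, not the actual `deploy.py` code:

```python
def create_dataset(version_id: str,
                   artifact_title: str,
                   artifact_abstract: str,
                   artifact_description: str,
                   license_url: str,
                   distributions: list) -> dict:
    """Build a minimal dataset graph; parameter names spell out artifact scope."""
    return {
        "@id": version_id,
        "title": artifact_title,
        "abstract": artifact_abstract,
        "description": artifact_description,
        "license": license_url,
        "distribution": distributions,
    }

# The CLI keeps its existing --title/--abstract/--description options and
# simply maps them onto the renamed parameters at the call site:
dataid = create_dataset(
    version_id="https://databus.example.org/user/group/artifact/2024.01.01",
    artifact_title="Example Artifact",
    artifact_abstract="A short abstract.",
    artifact_description="A longer description.",
    license_url="http://creativecommons.org/licenses/by/4.0/",
    distributions=[],
)
```

The URL and metadata values above are placeholders; only the mapping from CLI option to renamed keyword argument is the point.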
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
databusclient/api/deploy.py (4)
287-296: Update docstring to match the new parameter names.

The docstring still documents the old parameter names (`title`, `abstract`, `description`) instead of the new names (`artifact_title`, `artifact_abstract`, `artifact_description`).

📝 Proposed fix for the docstring

```diff
     version_id: str
         The version ID representing the Dataset. Needs to be in the form of
         $DATABUS_BASE/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION
-    title: str
-        The title text of the dataset
-    abstract: str
-        A short (one or two sentences) description of the dataset
-    description: str
-        A long description of the dataset. Markdown syntax is supported
+    artifact_title: str
+        The artifact title (permanent name of the data series, applies to all versions)
+    artifact_abstract: str
+        The artifact abstract (a short description of the artifact)
+    artifact_description: str
+        The artifact description (detailed metadata for the artifact). Markdown syntax is supported
     license_url: str
         The license of the dataset as a URI.
```
501-507: Update the call to `create_dataset` with the new parameter names.

The `deploy_from_metadata` function still calls `create_dataset` using the old parameter names (`title`, `abstract`, `description`), which will cause a `TypeError` since those parameters no longer exist.

🔧 Proposed fix

```diff
 dataset = create_dataset(
     version_id=version_id,
-    title=title,
-    abstract=abstract,
-    description=description,
+    artifact_title=title,
+    artifact_abstract=abstract,
+    artifact_description=description,
     license_url=license_url,
     distributions=distributions,
 )
```
269-281: Inconsistency: PR description doesn't match the actual changes.

The PR description states: "The changes preserve CLI compatibility and only modify help strings." However, this file changes the public API function signature of `create_dataset` by renaming parameters, which is a breaking change for any code that calls this function with keyword arguments.

If the intent is truly to only update CLI help text without breaking the API, consider keeping the original parameter names in the function signature and only updating the CLI layer where argument parsing happens.
269-281: Critical: Incomplete parameter rename causes NameError and TypeError at runtime.

The function signature renames `title`, `abstract`, and `description` to `artifact_title`, `artifact_abstract`, and `artifact_description`, but the refactoring is incomplete:

- **Function body references undefined variables (lines 383-387, 393-405):** The code still uses the old parameter names, which will raise `NameError` at runtime.
- **All callers use the old parameter names:** `deploy_from_metadata()` (line 501), `databusclient/cli.py:84`, and `tests/test_deploy.py:94` will fail with `TypeError` (unexpected keyword arguments).
- **Docstring is outdated (lines 282-315):** It still documents the old parameter names, creating an API documentation mismatch.

Additionally, this is a breaking API change that contradicts the PR description's claim of "preserve CLI compatibility."

Required fixes

Update lines 383-387 and 393-405 to use the renamed parameters:

```diff
 artifact_graph = {
     "@id": artifact_id,
     "@type": "Artifact",
-    "title": title,
-    "abstract": abstract,
-    "description": description,
+    "title": artifact_title,
+    "abstract": artifact_abstract,
+    "description": artifact_description,
 }
```

```diff
 dataset_graph = {
     "@type": ["Version", "Dataset"],
     "@id": _versionId,
     "hasVersion": version,
-    "title": title,
-    "abstract": abstract,
-    "description": description,
+    "title": artifact_title,
+    "abstract": artifact_abstract,
+    "description": artifact_description,
     "license": license_url,
     "distribution": distribution_list,
 }
```

Update all callers in `deploy_from_metadata()`, `databusclient/cli.py`, and `tests/test_deploy.py` to use the new parameter names, and update the docstring (lines 282-315) to reflect them.
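If the rename must not break existing keyword-argument callers, one option is a temporary deprecation shim that accepts both spellings. This is a hypothetical sketch, not part of this PR; the minimal return dict stands in for the real graph-building logic:

```python
import warnings

_RENAMES = {
    "title": "artifact_title",
    "abstract": "artifact_abstract",
    "description": "artifact_description",
}

def create_dataset(version_id, artifact_title=None, artifact_abstract=None,
                   artifact_description=None, license_url=None,
                   distributions=None, **kwargs):
    """Accept the old keyword names during a deprecation period."""
    params = {
        "artifact_title": artifact_title,
        "artifact_abstract": artifact_abstract,
        "artifact_description": artifact_description,
    }
    for old, new in _RENAMES.items():
        if old in kwargs:
            warnings.warn(f"'{old}' is deprecated; use '{new}' instead",
                          DeprecationWarning, stacklevel=2)
            if params[new] is None:
                params[new] = kwargs.pop(old)
    if kwargs:
        raise TypeError(f"Unexpected keyword arguments: {sorted(kwargs)}")
    return {
        "@id": version_id,
        "title": params["artifact_title"],
        "abstract": params["artifact_abstract"],
        "description": params["artifact_description"],
        "license": license_url,
        "distribution": distributions or [],
    }
```

Old callers keep working (with a warning) while new code uses the clearer names, and the shim can be dropped after a release cycle.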
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- databusclient/api/deploy.py
- databusclient/cli.py
🚧 Files skipped from review as they are similar to previous changes (1)
- databusclient/cli.py
Description

This PR updates the CLI help text for `--title`, `--abstract`, and `--description` to align with the DBpedia Databus data model hierarchy.

Currently, the CLI uses generic "Dataset" terminology, which causes confusion between Artifacts (the logical container) and Versions (the specific release). Users often mistake `--title` for a version-specific name, or `--abstract` for a global description.

These changes clarify the scope of each argument without breaking existing CLI compatibility.
Changes
I have updated the help strings to explicitly map the arguments to their databus concepts:
| Argument | Old help text | New help text |
| --- | --- | --- |
| `--title` | Dataset title | Artifact Label: the permanent name of the data series (applies to all versions) |
| `--abstract` | Dataset abstract max 200 chars | Version Abstract: a short summary (max 200 chars) specific to this timestamped release |
| `--description` | Dataset description | Version Description: detailed metadata for this specific release (supports Markdown) |

Solves Issue #21
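For illustration, the help-text mapping could be wired up as below. This is sketched with stdlib `argparse` rather than the project's actual CLI framework, so the option-handling details are assumptions:

```python
import argparse

parser = argparse.ArgumentParser(prog="databusclient deploy")
parser.add_argument(
    "--title",
    help="Artifact Label: the permanent name of the data series "
         "(applies to all versions)")
parser.add_argument(
    "--abstract",
    help="Version Abstract: a short summary (max 200 chars) specific "
         "to this timestamped release")
parser.add_argument(
    "--description",
    help="Version Description: detailed metadata for this specific "
         "release (supports Markdown)")

# The --help output now spells out the Artifact vs. Version scope of each option.
help_text = parser.format_help()
```

Only the help strings change; option names and parsing behavior stay exactly as before, which is what keeps the change backwards compatible.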
Summary by CodeRabbit
Documentation